Domain-Specific Hybrid Machine Translation from English to Portuguese

Authors

  • João António Rodrigues
  • Luís Gomes
  • Steven Neale
  • Andreia Querido
  • Nuno Rendeiro
  • Sanja Stajner
  • João Ricardo Silva
  • António Branco
Abstract

Machine translation (MT) from English to Portuguese has received relatively little attention in existing research. In this paper, we focus on MT from English to Portuguese for the specific domain of information technology (IT). We built a small in-domain parallel corpus to address the lack of publicly available IT-specific parallel corpora, and then adapted an existing hybrid MT system to the new language pair. We further improved the initial version of the EN-PT hybrid system by adding modules that address its most frequently occurring errors. To assess the improvement achieved by each of these dedicated modules, we compared all versions of our MT system using automatic evaluation. In addition, we conduct and report on a detailed error analysis of the initial and final versions of our system.

Similar resources

Machine Translation for Multilingual Troubleshooting in the IT Domain: A Comparison of Different Strategies

In this paper, we address the problem of machine translation (MT) of domain-specific texts for which large amounts of parallel data for training are not available. We focus on the IT domain and on English to Portuguese machine translation, and compare different strategies for improving system performance over two baselines, the first using only a large dataset of out-of-domain data, and the secon...

Comparison of SYSTRAN and Google Translate for English→Portuguese

Two machine translation (MT) systems, a statistical MT (SMT) system and a hybrid system (rule-based and SMT), were tested in order to compare their performance. The source language was English (EN) and the target language Portuguese (PT). The SMT tool produced far fewer errors than the hybrid system. Major problem areas of both systems concerned the transfer of verb systems from source to tar...

A Hybrid Model for Word Sense Disambiguation in English-Portuguese Machine Translation

We present the proposal for an approach to word sense disambiguation with application in machine translation from English to Brazilian Portuguese. This approach follows a hybrid natural language processing method, that is, a mixture of knowledge and corpus-based approaches. The main innovative feature is the formalism that we intend to use to represent the instances and the background knowledge...

A Hybrid Machine Translation System Based on a Monotone Decoder

In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...

The Scielo Corpus: a Parallel Corpus of Scientific Publications for Biomedicine

The biomedical scientific literature is a rich source of information not only in the English language, for which it is more abundant, but also in other languages, such as Portuguese, Spanish and French. We present the first freely available parallel corpus of scientific publications for the biomedical domain. Documents from the "Biological Sciences" and "Health Sciences" categories were retriev...

Publication date: 2016